A Bayesian Approach to Filtering Junk E-Mail

Authors

  • Mehran Sahami
  • Susan Dumais
  • David Heckerman
  • Eric Horvitz
Abstract

In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream. By casting this problem in a decision theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
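The interplay between a probabilistic classifier and a differential misclassification cost can be illustrated with a small sketch. It is not taken from the paper: it assumes a Bernoulli naive Bayes model with add-one smoothing as the probabilistic learner, and all names (`train`, `junk_probability`, `junk_threshold`, `is_junk`, `cost_ratio`) are hypothetical. The one decision-theoretic fact it relies on is that, if wrongly discarding a legitimate message is `cost_ratio` times as costly as letting a junk message through, the expected cost is minimized by flagging a message only when P(junk | message) exceeds `cost_ratio / (1 + cost_ratio)`.

```python
import math
from collections import Counter


def train(messages):
    """messages: iterable of (tokens, is_junk) pairs.
    Returns (doc_counts, token_counts), keyed by the class label True/False."""
    doc_counts = {True: 0, False: 0}
    token_counts = {True: Counter(), False: Counter()}
    for tokens, label in messages:
        doc_counts[label] += 1
        # Bernoulli model: count each token at most once per message.
        token_counts[label].update(set(tokens))
    return doc_counts, token_counts


def junk_probability(tokens, doc_counts, token_counts):
    """Posterior P(junk | tokens) under the assumed naive Bayes model."""
    vocab = set(token_counts[True]) | set(token_counts[False])
    present = set(tokens)
    total = doc_counts[True] + doc_counts[False]
    log_score = {}
    for label in (True, False):
        # Smoothed class prior.
        score = math.log((doc_counts[label] + 1) / (total + 2))
        for token in vocab:
            # Smoothed probability that the token appears in a message of this class.
            p = (token_counts[label][token] + 1) / (doc_counts[label] + 2)
            score += math.log(p if token in present else 1.0 - p)
        log_score[label] = score
    # Normalize in log space to avoid underflow.
    top = max(log_score.values())
    weight = {label: math.exp(s - top) for label, s in log_score.items()}
    return weight[True] / (weight[True] + weight[False])


def junk_threshold(cost_ratio):
    """Posterior above which flagging a message as junk minimizes expected cost."""
    return cost_ratio / (1.0 + cost_ratio)


def is_junk(tokens, model, cost_ratio):
    return junk_probability(tokens, *model) > junk_threshold(cost_ratio)
```

With equal costs (`cost_ratio = 1`) the threshold is 0.5, while `cost_ratio = 999` raises it to 0.999, so the same message can be flagged under the first setting and delivered under the second.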

Similar resources

Fuzzing E-mail Filters with Generative Grammars and N-Gram Analysis

Phishing attacks remain a common attack vector in today’s IT threat landscape, and one of the primary means of preventing phishing attacks is e-mail filtering. Most e-mail filtering is done using either a signature-based approach or Bayesian models, so when specific signatures are detected the e-mail is either quarantined or moved to a Junk mailbox. Much like antivirus, though, a ...

Spam / Junk E-Mail Filter Technique

Most e-mail readers regularly spend a significant amount of time deleting junk e-mail (spam) messages, which are part of the marketing campaigns of various companies to which users have typically signed up; such messages also increase the volume of storage space required and consume network bandwidth. A challenge, therefore, rests with the development and improvement of automatic classifiers that can ...

Publication date: 1998